46 research outputs found

    Software effort estimation based on optimized model tree

    Full text link
    Background: It is widely recognized that software effort estimation is a regression problem. Model Tree (MT) is one of the Machine Learning based regression techniques that is useful for software effort estimation, but as other machine learning algorithms, the MT has a large space of configuration and requires to carefully setting its parameters. The choice of such parameters is a dataset dependent so no general guideline can govern this process which forms the motivation of this work. Aims: This study investigates the effect of using the most recent optimization algorithm called Bees algorithm to specify the optimal choice of MT parameters that fit a dataset and therefore improve prediction accuracy. Method: We used MT with optimal parameters identified by the Bees algorithm to construct software effort estimation model. The model has been validated over eight datasets come from two main sources: PROMISE and ISBSG. Also we used 3-Fold cross validation to empirically assess the prediction accuracies of different estimation models. As benchmark, results are also compared to those obtained with Stepwise Regression Case-Based Reasoning and Multi-Layer Perceptron. Results: The results obtained from combination of MT and Bees algorithm are encouraging and outperforms other well-known estimation methods applied on employed datasets. They are also interesting enough to suggest the effectiveness of MT among the techniques that are suitable for effort estimation. Conclusions: The use of the Bees algorithm enabled us to automatically find optimal MT parameters required to construct effort estimation models that fit each individual dataset. Also it provided a significant improvement on prediction accuracy

    Analogy-based effort estimation: a new method to discover set of analogies from dataset characteristics

    Full text link
    Analogy-based effort estimation (ABE) is one of the efficient methods for software effort estimation because of its outstanding performance and capability of handling noisy datasets. Conventional ABE models usually use the same number of analogies for all projects in the datasets in order to make good estimates. The authors' claim is that using same number of analogies may produce overall best performance for the whole dataset but not necessarily best performance for each individual project. Therefore there is a need to better understand the dataset characteristics in order to discover the optimum set of analogies for each project rather than using a static k nearest projects. Method: We propose a new technique based on Bisecting k-medoids clustering algorithm to come up with the best set of analogies for each individual project before making the prediction. Results & Conclusions: With Bisecting k-medoids it is possible to better understand the dataset characteristic, and automatically find best set of analogies for each test project. Performance figures of the proposed estimation method are promising and better than those of other regular ABE model

    Analyzing the Relationship between Project Productivity and Environment Factors in the Use Case Points Method

    Full text link
    Project productivity is a key factor for producing effort estimates from Use Case Points (UCP), especially when the historical dataset is absent. The first versions of UCP effort estimation models used a fixed number or very limited numbers of productivity ratios for all new projects. These approaches have not been well examined over a large number of projects so the validity of these studies was a matter for criticism. The newly available large software datasets allow us to perform further research on the usefulness of productivity for effort estimation of software development. Specifically, we studied the relationship between project productivity and UCP environmental factors, as they have a significant impact on the amount of productivity needed for a software project. Therefore, we designed four studies, using various classification and regression methods, to examine the usefulness of that relationship and its impact on UCP effort estimation. The results we obtained are encouraging and show potential improvement in effort estimation. Furthermore, the efficiency of that relationship is better over a dataset that comes from industry because of the quality of data collection. Our comment on the findings is that it is better to exclude environmental factors from calculating UCP and make them available only for computing productivity. The study also encourages project managers to understand how to better assess the environmental factors as they do have a significant impact on productivityComment: Journal of Software: Evolution and Process, 201

    Model tree based adaption strategy for software effort estimation by analogy

    Full text link
    Background: Adaptation technique is a crucial task for analogy based estimation. Current adaptation techniques often use linear size or linear similarity adjustment mechanisms which are often not suitable for datasets that have complex structure with many categorical attributes. Furthermore, the use of nonlinear adaptation technique such as neural network and genetic algorithms needs many user interactions and parameters optimization for configuring them (such as network model, number of neurons, activation functions, training functions, mutation, selection, crossover, ... etc.). Aims: In response to the abovementioned challenges, the present paper proposes a new adaptation strategy using Model Tree based attribute distance to adjust estimation by analogy and derive new estimates. Using Model Tree has an advantage to deal with categorical attributes, minimize user interaction and improve efficiency of model learning through classification. Method: Seven well known datasets have been used with 3-Fold cross validation to empirically validate the proposed approach. The proposed method has been investigated using various K analogies from 1 to 3. Results: Experimental results showed that the proposed approach produced better results when compared with those obtained by using estimation by analogy based linear size adaptation, linear similarity adaptation, 'regression towards the mean' and null adaptation. Conclusions: Model Tree could form a useful extension for estimation by analogy especially for complex data sets with large number of categorical attributes

    A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods

    Full text link
    Improving the precision of heart diseases detection has been investigated by many researchers in the literature. Such improvement induced by the overwhelming health care expenditures and erroneous diagnosis. As a result, various methodologies have been proposed to analyze the disease factors aiming to decrease the physicians practice variation and reduce medical costs and errors. In this paper, our main motivation is to develop an effective intelligent medical decision support system based on data mining techniques. In this context, five data mining classifying algorithms, with large datasets, have been utilized to assess and analyze the risk factors statistically related to heart diseases in order to compare the performance of the implemented classifiers (e.g., Na\"ive Bayes, Decision Tree, Discriminant, Random Forest, and Support Vector Machine). To underscore the practical viability of our approach, the selected classifiers have been implemented using MATLAB tool with two datasets. Results of the conducted experiments showed that all classification algorithms are predictive and can give relatively correct answer. However, the decision tree outperforms other classifiers with an accuracy rate of 99.0% followed by Random forest. That is the case because both of them have relatively same mechanism but the Random forest can build ensemble of decision tree. Although ensemble learning has been proved to produce superior results, but in our case the decision tree has outperformed its ensemble version

    An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation

    Full text link
    Objective: This paper investigates the potential of ensemble learning for variants of adjustment methods used in analogy-based effort estimation. The number k of analogies to be used is also investigated. Method We perform a large scale comparison study where many ensembles constructed from n out of 40 possible valid variants of adjustment methods are applied to eight datasets. The performance of each method was evaluated based on standardized accuracy and effect size. Results: The results have been subjected to statistical significance testing, and show reasonable significant improvements on the predictive performance where ensemble methods are applied. Conclusion: Our conclusions suggest that ensembles of adjustment methods can work well and achieve good performance, even though they are not always superior to single methods. We also recommend constructing ensembles from only linear adjustment methods, as they have shown better performance and were frequently ranked higher

    Learning best K analogies from data distribution for case-based software effort estimation

    Full text link
    Case-Based Reasoning (CBR) has been widely used to generate good software effort estimates. The predictive performance of CBR is a dataset dependent and subject to extremely large space of configuration possibilities. Regardless of the type of adaptation technique, deciding on the optimal number of similar cases to be used before applying CBR is a key challenge. In this paper we propose a new technique based on Bisecting k-medoids clustering algorithm to better understanding the structure of a dataset and discovering the the optimal cases for each individual project by excluding irrelevant cases. Results obtained showed that understanding of the data characteristic prior prediction stage can help in automatically finding the best number of cases for each test project. Performance figures of the proposed estimation method are better than those of other regular K-based CBR methods.Comment: arXiv admin note: substantial text overlap with arXiv: 1703.0456

    Fuzzy Model Tree For Early Effort Estimation

    Full text link
    Use Case Points (UCP) is a well-known method to estimate the project size, based on Use Case diagram, at early phases of software development. Although the Use Case diagram is widely accepted as a de-facto model for analyzing object oriented software requirements over the world, UCP method did not take sufficient amount of attention because, as yet, there is no consensus on how to produce software effort from UCP. This paper aims to study the potential of using Fuzzy Model Tree to derive effort estimates based on UCP size measure using a dataset collected for that purpose. The proposed approach has been validated against Treeboost model, Multiple Linear Regression and classical effort estimation based on the UCP model. The obtained results are promising and show better performance than those obtained by classical UCP, Multiple Linear Regression and slightly better than those obtained by Tree boost model

    v-SVR Polynomial Kernel for Predicting the Defect Density in New Software Projects

    Full text link
    An important product measure to determine the effectiveness of software processes is the defect density (DD). In this study, we propose the application of support vector regression (SVR) to predict the DD of new software projects obtained from the International Software Benchmarking Standards Group (ISBSG) Release 2018 data set. Two types of SVR (e-SVR and v-SVR) were applied to train and test these projects. Each SVR used four types of kernels. The prediction accuracy of each SVR was compared to that of a statistical regression (i.e., a simple linear regression, SLR). Statistical significance test showed that v-SVR with polynomial kernel was better than that of SLR when new software projects were developed on mainframes and coded in programming languages of third generationComment: 6 pages, accepted at Special Session: ML for Predictive Models in Eng. Applications at the 17th IEEE International Conference on Machine Learning and Applications, 17th IEEE ICMLA 201

    A Comparison Between Decision Trees and Decision Tree Forest Models for Software Development Effort Estimation

    Full text link
    Accurate software effort estimation has been a challenge for many software practitioners and project managers. Underestimation leads to disruption in the projects estimated cost and delivery. On the other hand, overestimation causes outbidding and financial losses in business. Many software estimation models exist; however, none have been proven to be the best in all situations. In this paper, a decision tree forest (DTF) model is compared to a traditional decision tree (DT) model, as well as a multiple linear regression model (MLR). The evaluation was conducted using ISBSG and Desharnais industrial datasets. Results show that the DTF model is competitive and can be used as an alternative in software effort prediction.Comment: 3rd International Conference on Communications and Information Technology (ICCIT), Beirut, Lebanon, pp. 220-224, 201
    corecore